Gut microbiota-based machine-learning signature for the diagnosis of alcohol-associated and metabolic dysfunction-associated steatotic liver disease

Alcohol-associated liver disease (ALD) and metabolic dysfunction-associated steatotic liver disease (MASLD) are highly prevalent worldwide. Because the gut microbiota reflects the current state of ALD and MASLD via the gut-liver axis, characteristic features of the gut microbiota can serve as potential diagnostic markers for these diseases. Machine learning (ML) algorithms improve diagnostic performance in various diseases. Using gut microbiota-based ML algorithms, we evaluated a diagnostic index for ALD and MASLD. Fecal 16S rRNA sequencing data from 263 ALD (control, elevated liver enzyme [ELE], cirrhosis, and hepatocellular carcinoma [HCC]) and 201 MASLD (control and ELE) subjects were collected. For external validation, 126 ALD and 84 MASLD subjects were recruited. Four supervised ML algorithms (support vector machine, random forest, multilayer perceptron, and convolutional neural network) were used for classification with 20, 40, 60, and 80 features; three unsupervised ML algorithms (independent component analysis, principal component analysis, and random projection) were used for feature reduction, and features selected by linear discriminant analysis (LDA) score were also tested. A total of 52 combinations of ML algorithms were evaluated for each pair of subgroups, with 60 hyperparameter variations and Stratified ShuffleSplit tenfold cross-validation. ML models combining a convolutional neural network with principal component analysis achieved areas under the receiver operating characteristic curve (AUCs) > 0.90. In ALD, the diagnostic AUC values of the ML strategy (vs. control) were 0.94, 0.97, and 0.96 for ELE, cirrhosis, and liver cancer, respectively. The AUC value (vs. control) for MASLD (ELE) was 0.93. In the external validation, the AUC values for ALD and MASLD (vs. control) were > 0.90 and 0.88, respectively. The gut microbiota-based ML strategy can be used for the diagnosis of ALD and MASLD. ClinicalTrials.gov NCT04339725


Microbial differences in alcohol-associated liver disease
The control group showed higher Shannon index scores than the elevated liver enzyme (ELE), cirrhosis, and liver cancer groups (p < 0.01). No difference in the Shannon index was found among the ELE, cirrhosis, and liver cancer groups. The same pattern was observed for the inverse Simpson index and Pielou evenness (Fig. S1A). The Chao1 index of the control and ELE groups was higher than that of the alcoholic cirrhosis and liver cancer groups (p < 0.01). The ELE group also differed from the cirrhosis and liver cancer groups in the Chao1 index (p < 0.01) (Fig. 3A). In the β diversity analysis and phylum composition, each group showed differences (Fig. 3B,C). β diversity was ordinated using the Aitchison distance, i.e., by applying PCA to the centered log-ratio (CLR)-transformed counts. Using the unweighted UniFrac distance, PCoA indicated different community compositions at the OTU level between the control and liver fibrosis groups, with PC1 and PC2 explaining 1.6% and 2.4% of the variance, respectively. The group centroids differed in location (Pr [> F] = 0.001) (Fig. S1B). LEfSe analysis was performed to identify the bacterial species that differed between the control and ALD groups, with an LDA score > 2 (Tables S2 and S4, Fig. 3D,E).
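The α diversity indices compared above have standard closed forms. The following minimal Python sketch computes them from one sample's taxon counts; it assumes the bias-corrected Chao1 estimator, and the function names are hypothetical illustrations rather than the study's code (the study reports computing these indices from QIIME tables with the R package vegan).

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over observed taxa."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum(p_i^2)."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S), where S is the number of observed taxa."""
    s_obs = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(s_obs) if s_obs > 1 else 0.0


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1(F1 - 1) / (2(F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton taxa."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    sample = [25, 12, 8, 3, 1, 1, 2]  # hypothetical OTU counts for one sample
    print(shannon(sample), inverse_simpson(sample),
          pielou_evenness(sample), chao1(sample))
```

Chao1 differs from the other three indices in that it estimates unseen richness from rare taxa, which is why it can rank groups differently from Shannon or inverse Simpson.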

Microbial differences in metabolic dysfunction-associated steatotic liver disease
No differences in the Shannon (p = 0.339) or inverse Simpson (p = 0.401) indices were found between the control and ELE groups (Fig. 4A and Fig. S2A). The Chao1 index of α diversity was lower in the control group than in the ELE group (p = 0.002). PCoA indicated different community compositions at the OTU level between the control group and the liver fibrosis group, with PC1 and PC2 explaining 1.9% and 1.7% of the variance, respectively (Fig. 4B and Fig. S2B).
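The percentages reported for PC1 and PC2 are the shares of variation captured by each ordination axis. A minimal NumPy sketch of classical PCoA illustrates how such percentages arise, using the Aitchison distance (Euclidean distance between CLR-transformed counts) as an example input. The pseudocount of 0.5 and all function names are assumptions; percentages are computed over positive eigenvalues only, which is one common convention when a non-Euclidean distance such as UniFrac yields negative eigenvalues.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples x taxa count matrix.
    The pseudocount (an assumption) avoids log(0) for absent taxa."""
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def pairwise_euclidean(x):
    """Euclidean distance matrix between the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pcoa(dist):
    """Classical PCoA: Gower-centre -0.5 * D^2, then eigendecompose.
    Returns sample coordinates and % variance explained per axis."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    b = -0.5 * j @ d2 @ j                        # Gower-centred matrix
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pos = eigvals > 1e-10                        # keep positive axes only
    coords = eigvecs[:, pos] * np.sqrt(eigvals[pos])
    pct = 100.0 * eigvals[pos] / eigvals[pos].sum()
    return coords, pct


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.integers(0, 50, size=(6, 8))    # hypothetical OTU table
    coords, pct = pcoa(pairwise_euclidean(clr(counts)))
    print("PC1 %.1f%%, PC2 %.1f%%" % (pct[0], pct[1]))
```

With the Aitchison distance, these PCoA coordinates match PCA on the CLR matrix up to sign, which is why the two descriptions of the ordination are interchangeable.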
In the control group, a lower abundance of Bacteroidia and Betaproteobacteria was found at the class level (p < 0.05). At the order level, Desulfovibrionales decreased significantly in the control group (p < 0.05). At the family level, a lower abundance of Bacteroidaceae was identified, and Bacteroides decreased significantly at the genus level (p < 0.05). At the species level, lower abundances of Hamsteri, Clostridium, and D168 were found in the control group (p < 0.05) (Fig. 4C-F). A Circos plot of the most abundant bacterial genera in the normal and ELE groups of the MASLD dataset shows the genera with LDA scores > 2.0 across all bacterial sequences (Tables S3 and S4, Fig. S2C).

Diagnostic value of supervised machine learning models and external validation
Table 1 presents performance measures of the four ML algorithms evaluated on each testing dataset of ALD and MASLD, using 40 reduced feature dimensions for both the ALD and MASLD groups. The CNN classifier combined with PCA feature reduction performed better than the other models, achieving AUC values between 0.92 and 0.96 for the ALD datasets and an AUC of ≈ 0.96 for the MASLD dataset. Tables S4 and S5 present performance measures of the CNN model for four numbers (20, 40, 60, and 80) of feature dimensions reduced by PCA for the ALD and MASLD datasets, respectively, and Table S6 gives the architecture of the CNN model used in this study.
With the ALD datasets, precision, recall, and accuracy ranged from 0.89 to 0.96, 0.94 to 0.99, and 0.94 to 0.98, respectively, for 20 reduced feature dimensions; from 0.91 to 0.96, 0.95 to 0.99, and 0.95 to 0.97, respectively, for 40 reduced feature dimensions; from 0.88 to 0.96, 0.94 to 0.98, and 0.94 to 0.98, respectively, for 60 reduced feature dimensions; and from 0.88 to 0.97, 0.94 to 0.99, and 0.93 to 0.98, respectively, for 80 reduced feature dimensions (Fig. 5A). With the MASLD datasets of the normal and ELE groups, precision, recall, and accuracy were 0.94, 0.93, and 0.92, respectively, for 20 reduced feature dimensions; 0.95, 0.95, and 0.94, respectively, for 40 reduced feature dimensions; 0.97, 0.95, and 0.95, respectively, for 60 reduced feature dimensions; and 0.98, 0.96, and 0.96, respectively, for 80 reduced feature dimensions (Fig. 5B). The CNN model trained with 40 reduced feature dimensions had slightly higher sensitivity and specificity, showing good diagnostic classification power for identifying the disease groups.
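The precision, recall, and accuracy values above, together with the F1 score and specificity referred to elsewhere in the paper, all derive from the binary confusion matrix (true/false positives and negatives). The following sketch uses a hypothetical function name and hypothetical counts for illustration only; recall is the same quantity as sensitivity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)              # positive predictive value
    recall = tp / (tp + fn)                 # sensitivity
    specificity = tn / (tn + fp)            # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    # Hypothetical test set: 55 patients and 45 controls
    print(classification_metrics(tp=45, fp=5, tn=40, fn=10))
```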
In the external validation, the AUC values for ALD and MASLD (vs. control) were > 0.90 and 0.88, respectively, taking the higher value of the CNN/PCA and CNN/RP combinations (Fig. 5C).
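An AUC can be read as the probability that a randomly chosen patient receives a higher predicted score than a randomly chosen control. A minimal pure-Python sketch of this rank-based (Mann-Whitney) definition follows; the function name is hypothetical, ties count as one half, and the double loop is quadratic in sample size, so it is for illustration only (scikit-learn's `roc_auc_score` computes the same quantity efficiently for binary labels).

```python
def roc_auc(labels, scores):
    """AUC = P(score of random positive > score of random negative),
    with ties counted as 1/2 (Mann-Whitney U / (n_pos * n_neg))."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 1 = patient, 0 = control; scores are hypothetical model outputs
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```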

Discussion
In liver disease (LD), artificial intelligence has been applied to detect fibrosis, differentiate liver masses, predict the prognosis of chronic LD, and diagnose MASLD 21. A previous report evaluated the development and validation of ML and artificial intelligence models using gut microbiome data for cancer treatment and diagnosis 22. Loomba et al. developed a random forest classifier model that showed excellent diagnostic accuracy for detecting advanced fibrosis in MASLD (AUC 0.936) 23. In our clinical data for ALD and MASLD, ML models combining convolutional neural networks with principal component analysis on 40 taxonomic features achieved AUC scores > 0.90 for the 8 paired groups, showing the potential of ML for predictive diagnosis. Compared with other microbiome studies, our data are the first to link metagenomic features with ALD and MASLD, leading to a potential ML model for diagnosing these diseases. In the external validation using patient data from different regions, the AUC was over 0.90 for ALD and 0.88 for MASLD. These results suggest that our ML strategy can be useful for diagnosing ALD and MASLD. As typical microbiota signatures affect the development and progression of human diseases, demonstrating the relationship between the gut microbiome and disease characteristics may be a cornerstone of future health care 24,25. With advances in microbiome-related technologies and personalized medicine, the sheer volume and complexity of the data limit conventional statistical analysis and predictive potential. Because the gut microbiota varies with many factors, no universally applicable microbiota-based metagenomic signature is known 26. Recently, ML and artificial intelligence have been actively applied to the analysis of large-scale health care information and microbiome-based big datasets, and applications have been reported in various diseases 27.
In ALD, the AUC for the diagnosis of liver cirrhosis reached 0.97, indicating that patients with alcoholic cirrhosis showed a typical dysbiosis pattern compared with the control or ELE group. Anaerosinus, Glycerini, Coriobacteriia, Granulicatella, Balaenopterae, Collinsella, Coriobacteriales, Paraplantarum, Aerococcaceae, Spiroforme, and Bolteae were abundant taxa in our study. In a previous report, the abundance of Bacteroides, Escherichia, Shigella, and Prevotella was closely related to portal hypertension in patients with cirrhosis 28. Previous research has regarded Roseburia and Faecalibacterium prausnitzii as beneficial taxa. In this study, the normal control group showed enrichment of Roseburia and Faecalibacterium. Taken together, the results indicate that an ML method using microbiome composition can be an efficient modality for diagnosing LD, especially alcoholic cirrhosis.
Previous investigators have demonstrated that ALD is associated with disruptions in the gut microbiome. Bacteroidetes abundance was decreased in heavy-drinking controls 29. In another study, patients with alcoholic ELE showed decreases in Verrucomicrobia, Akkermansia, and Bacteroides 30. In this study, α and β diversity decreased with disease progression. The proportion of Proteobacteria was increased in the cirrhosis group. The abundances of Bacteroidaceae, Bacteroides, Equi, Butyricimonas, Erysipelotrichi, and Erysipelotrichales were elevated in the ELE group. In addition, each group showed a distinct microbiota composition. In ALD, therefore, the microbiota composition shows features specific to each disease stage.
In our results, the diagnostic AUC of microbiota-based ML for metabolic dysfunction-associated steatohepatitis (MASH) was 0.93 ± 0.11. In other reports, a decision-tree algorithm applied to a Canadian dataset diagnosed MASLD with 76% accuracy and an AUROC of 0.73 31. Another study reported AUROCs of 0.83 to 0.88 for MASH in a large US population 32. Because our data showed higher diagnostic performance than these artificial intelligence approaches based on clinical data, ML analysis of the intestinal microbiota may offer higher diagnostic accuracy. In this study, the CNN model trained with 40 reduced feature dimensions had slightly higher sensitivity and specificity, showing good classification power for identifying the disease groups. One possible reason is that the CNN architecture used in this study is better suited than models such as RF or MLP to capturing patterns within individual OTU sequences or sequential relationships. In a previous study, a gradient-boosting machine algorithm provided the best prediction of liver cancer risk in patients with viral infection 33. Considering that ALD and MASLD are major public health problems and that approximately 50% of cirrhosis cases in Asia are related to ALD and MASLD 34, it is necessary to develop diagnostic technologies, including big data and ML techniques, that go beyond conventional statistics.
We used 16S rRNA sequencing rather than shotgun sequencing because it is inexpensive and simple, enabling rapid diagnosis and easy clinical use. In terms of diagnostic accuracy, the 16S rRNA-based ML strategy achieved high scores and provides a basis for real-time personalized medicine. Recently, multiomics-based analysis has been applied to the diagnosis, treatment, and prediction of various diseases 35. ML methods using images, pathological tissues, and clinical results are expected to be used in various fields of LD in the future. In South Korea, the prevalence of MASLD-related cirrhosis and liver cancer is low; therefore, ML analysis of MASLD-related cirrhosis and liver cancer was not performed in this study.
MASLD is defined using cardiometabolic markers. However, not all patients underwent liver biopsy for the diagnosis of MASH, so we used the ELE group to reduce selection bias. Because MASLD can be diagnosed clinically without a biopsy, this approach is expected to increase the accessibility of AI-based diagnosis in the future. In South Korea, there are few cases of advanced liver disease (liver cirrhosis or HCC) associated with MASLD, and we did not enroll patients with MASLD-related cirrhosis or HCC. Regarding the elevated α diversity in the ELE group compared with the control group in MASLD, in some cases the distribution of the microbiota changes and its diversity increases in the presence of liver disease.

Conclusion
This study provides scientific evidence supporting the high diagnostic accuracy of microbiota-based analysis for ALD and MASLD and offers a holistic basis for further research. The CNN model trained with 40 reduced feature dimensions had slightly higher sensitivity and specificity. This work developed a microbiota-based ML strategy with high diagnostic accuracy for ALD and MASLD. With the development of personalized medicine, diagnostic technology using big data and genetic information may replace imaging or liver biopsy. The intestinal microbiota, which reflects an individual's health status, is closely related to LDs. Microbiota-based ML strategies can be used to diagnose ALD and MASLD, supporting personalized treatment and the prevention of side effects. In the future, we expect to build on this microbiota-based ML strategy to develop ML methods for treatment and follow-up.

Study design and participants
ALD was diagnosed based on alcohol history, liver biopsy, blood chemistry, or imaging studies (ultrasound or computed tomography). The ALD group was subgrouped into control, ELE, cirrhosis, and liver cancer. Patients were classified as alcoholic ELE if they had abnormal liver enzymes [aspartate aminotransferase (AST) ≥ 50 IU/L, AST/alanine aminotransferase (ALT) > 1.5, and AST and ALT < 400 IU/L] and excessive alcohol consumption (> 60 g/day for men and > 40 g/day for women), with the last alcoholic drink within 8 weeks of jaundice onset (bilirubin > 3 mg/dL). The control group was recruited from a medical check-up center. Patients meeting any of the following criteria were excluded: age > 70 years, hepatitis A, B, C, or E virus-related cirrhosis, HIV infection, Wilson's disease, biliary obstruction, sepsis, drug-induced liver injury, autoimmune LD, history of high-dose steroid or antibiotic use, presence of a liver tumor or history of other cancer, or pregnancy.
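The alcoholic ELE definition above combines several thresholds. As an illustration only, a literal Python encoding of the stated criteria might look like the sketch below; the function and parameter names are hypothetical, the jaundice clause is encoded exactly as written, any sex other than "male" is treated with the female threshold, and the function is not a validated clinical tool.

```python
def is_alcoholic_ele(sex, ast, alt, alcohol_g_per_day,
                     bilirubin_mg_dl, weeks_last_drink_to_jaundice):
    """Literal reading of the alcoholic ELE inclusion criteria (hypothetical helper)."""
    # Abnormal liver enzymes: AST >= 50 IU/L, AST/ALT > 1.5, both < 400 IU/L
    enzymes = (ast >= 50 and alt > 0 and ast / alt > 1.5
               and ast < 400 and alt < 400)
    # Excessive drinking: > 60 g/day for men, > 40 g/day for women
    limit = 60 if sex == "male" else 40
    heavy_drinking = alcohol_g_per_day > limit
    # Jaundice clause as stated: bilirubin > 3 mg/dL and last drink
    # within 8 weeks of jaundice onset
    jaundice = bilirubin_mg_dl > 3 and weeks_last_drink_to_jaundice <= 8
    return enzymes and heavy_drinking and jaundice


if __name__ == "__main__":
    print(is_alcoholic_ele("female", ast=120, alt=60, alcohol_g_per_day=50,
                           bilirubin_mg_dl=4.0, weeks_last_drink_to_jaundice=2))
```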
MASLD was diagnosed according to the definition in the consensus statement on new fatty liver disease nomenclature 36,37. Cardiometabolic criteria, including body mass index, fasting glucose, medication history, blood pressure, or cholesterol level, were used for the diagnosis of MASLD. Subjects in the normal control group met none of the cardiometabolic criteria. Patients with elevated liver enzymes [AST or ALT ≥ 50 IU/L] or hepatitis on liver pathology were included in the ELE group. None of them consumed excessive alcohol (> 210 g/week for men and > 140 g/week for women). Patients with autoimmune LD, alcohol use disorder, pancreatitis, hemochromatosis, viral LD, pregnancy, Wilson's disease, drug-induced liver injury, or other cancers were excluded.
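The MASLD ELE criteria can be sketched the same way. This hypothetical helper assumes that the MASLD (cardiometabolic) definition has already been evaluated elsewhere and is passed in as a boolean; it only encodes the enzyme, pathology, and weekly alcohol thresholds stated above, and is for illustration, not clinical use.

```python
def is_masld_ele(sex, ast, alt, alcohol_g_per_week,
                 meets_masld_criteria, hepatitis_on_pathology=False):
    """Literal reading of the MASLD ELE inclusion criteria (hypothetical helper)."""
    # Excessive drinking excludes: > 210 g/week (men) or > 140 g/week (women)
    weekly_limit = 210 if sex == "male" else 140
    if not meets_masld_criteria or alcohol_g_per_week > weekly_limit:
        return False
    # Elevated enzymes (AST or ALT >= 50 IU/L) or hepatitis on pathology
    return ast >= 50 or alt >= 50 or hepatitis_on_pathology


if __name__ == "__main__":
    print(is_masld_ele("female", ast=30, alt=62, alcohol_g_per_week=70,
                       meets_masld_criteria=True))
```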
Cirrhosis was diagnosed based on the presence of complications (varices, ascites, and encephalopathy), blood tests, imaging findings, FibroScan, or liver pathology. Liver cancer was diagnosed by two or more imaging modalities, such as computed tomography, magnetic resonance imaging, angiography, or contrast-enhanced ultrasound. In addition, subjects taking drugs that affect the gut microbiota were excluded at enrollment. For the control group, we included healthy subjects who visited the center for a health check-up.
Baseline studies included family history, diet pattern, alcohol history, abdominal ultrasound and computed tomography, X-ray, electrocardiography, complete blood count, electrolytes, liver function tests, viral markers, and Child-Pugh score. Blood analysis was performed using standard methods. Serum biochemical parameters included AST, ALT, albumin, bilirubin, alkaline phosphatase (ALP), gamma-glutamyl transpeptidase (GGT), blood urea nitrogen, creatinine, international normalized ratio, α-fetoprotein, carcinoembryonic antigen, prothrombin time, blood glucose, and total cholesterol. Markers of hepatitis A, B, and C and other viruses were evaluated. Antinuclear antibody, antimitochondrial antibody, and anti-smooth muscle antibody tests were also performed.
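The Child-Pugh score listed among the baseline studies sums five components, each scored 1 to 3, and maps the total to classes A (5 to 6), B (7 to 9), and C (10 to 15). A minimal sketch using the conventional thresholds (bilirubin in mg/dL, albumin in g/dL) follows; the function name and ascites labels are hypothetical, and institutions differ slightly in how boundary values are assigned.

```python
def child_pugh(bilirubin, albumin, inr, ascites, encephalopathy_grade):
    """Return (score, class) using conventional Child-Pugh thresholds.
    ascites: "none", "mild", "moderate", or "severe";
    encephalopathy_grade: 0 (none) to 4."""
    bili_pts = 1 if bilirubin < 2 else 2 if bilirubin <= 3 else 3
    alb_pts = 1 if albumin > 3.5 else 2 if albumin >= 2.8 else 3
    inr_pts = 1 if inr < 1.7 else 2 if inr <= 2.3 else 3
    ascites_pts = {"none": 1, "mild": 2, "moderate": 3, "severe": 3}[ascites]
    he_pts = 1 if encephalopathy_grade == 0 else 2 if encephalopathy_grade <= 2 else 3
    score = bili_pts + alb_pts + inr_pts + ascites_pts + he_pts
    cls = "A" if score <= 6 else "B" if score <= 9 else "C"
    return score, cls


if __name__ == "__main__":
    print(child_pugh(bilirubin=2.5, albumin=3.6, inr=1.2,
                     ascites="mild", encephalopathy_grade=0))
```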

Stool sample and processing
Sequencing was carried out according to the manufacturer's instructions at Chunlab, Inc. (Seoul, Republic of Korea) with the Illumina MiSeq platform using reagent kit V3 in PE 250 bp mode.Microbiome taxonomic profiling was conducted with the EZBioCloud platform (ChunLab Inc., Republic of Korea) using the database version PKSSU4.0.
Immediately after each patient collected 2 to 3 g of feces using the kit (stool paper and stool box), the sample was stored at −20 °C and moved to −80 °C within 1 day. Genomic DNA for metagenomic sequencing was extracted with a QIAamp stool kit (Qiagen, Hilden, Germany), and the library was prepared with a NEBNext Ultra II FS DNA Library Prep Kit for Illumina (New England BioLabs, Ipswich, MA, USA) according to the manufacturer's directions. Library quantity was checked using a Qubit dsDNA HS assay kit (Thermo Fisher Scientific, Waltham, MA, USA) and confirmed by quantitative polymerase chain reaction (qPCR) with a KAPA SYBR FAST qPCR Master Mix kit (Kapa Biosystems, Wilmington, MA, USA). Library quality was assessed on a Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA) using a DNA 12,000 chip. All libraries were sequenced on the NovaSeq 6000 platform (Illumina, USA) with paired-end 150 bp reads.
The analysis was performed as described previously 38. In brief, DNA was extracted with a QIAamp stool kit, and the V3-V4 region of the bacterial 16S rRNA gene was amplified using barcoded fusion primers. The forward fusion primer contained the P5 adapter, i5 index, and gene-specific primer 341F (5′-AAT GAT ACG GCG ACC ACC GAG ATC TACAC-XXXXXXXX-TCG TCG GCA GCG TCA GAT GTG TAT AAG AGA CAG-CCT ACG GGNGGC WGC AG-3′; the segment after the last hyphen is the target-region primer, and X indicates the barcode region), and the reverse fusion primer contained the P7 adapter, i7 index, and gene-specific primer 805R (5′-CAA GCA GAA GAC GGC ATA CGA GAT-XXXXXXXX-GTC TCG TGG GCT CGG AGA TGTG TAT AAG AGA CAG-GAC TAC HVGGG TAT CTA ATC C-3′); both primers included sequencing adapters and the dual-index barcodes of the Nextera XT kit (Illumina, San Diego, CA, USA). Amplification was performed in a C1000 Touch thermal cycler (Bio-Rad Laboratories, Inc., Hercules, CA, USA) under the following conditions: initial denaturation at 95 °C for 3 min; 25 cycles of denaturation at 95 °C for 30 s, annealing at 55 °C for 30 s, and extension at 72 °C for 30 s; and a final extension at 72 °C for 5 min. Each PCR product was confirmed by 1% agarose gel electrophoresis and visualized on a Gel Doc XR+ imaging system (Bio-Rad Laboratories, Inc., USA). The amplified products were purified and size-selected with Agencourt AMPure XP beads (Beckman Coulter, Chaska, MN, USA). The library was constructed from pooled PCR products, its quality was assessed on a Bioanalyzer 2100 (Agilent, USA) using a DNA 12,000 chip, and it was quantified by qPCR with a KAPA SYBR FAST qPCR Master Mix kit (Kapa Biosystems, USA).
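The gene-specific primers above contain IUPAC degenerate bases (N, W, H, V). The sketch below shows one way to expand such a primer into a regular expression and to check that an amplicon carries the 341F and 805R target regions, the reverse primer being found on the reverse complement. The helper names and the example amplicon are hypothetical, and real pipelines typically also tolerate a few mismatches.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
# Complement of each IUPAC code (e.g. R <-> Y, B <-> V, N -> N)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

PRIMER_341F = "CCTACGGGNGGCWGCAG"
PRIMER_805R = "GACTACHVGGGTATCTAATCC"


def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def reverse_complement(seq):
    """Reverse complement that also handles IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Hypothetical amplicon: a 341F instance, an insert, then the
    # reverse complement of an 805R instance
    amplicon = ("CCTACGGGAGGCAGCAG" + "TTGACCTTGA"
                + reverse_complement("GACTACACGGGTATCTAATCC"))
    print(bool(primer_regex(PRIMER_341F).match(amplicon)),
          bool(primer_regex(PRIMER_805R).search(reverse_complement(amplicon))))
```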

Sequence and statistical analysis
The sequencing data were processed using Quantitative Insights Into Microbial Ecology 2 (QIIME 2). Low-quality reads were removed according to the following criteria: (1) length < 150 bp, (2) average Phred score < 20, (3) ambiguous bases, and (4) mononucleotide repeats > 8 bp. High-quality reads were clustered into 16S rRNA operational taxonomic units (OTUs) at ≥ 97% sequence identity 39. Each OTU was taxonomically classified with VSEARCH by comparing the representative sequence set against the SILVA reference database 40.
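As a minimal sketch, the four read-filtering criteria above can be expressed as a single predicate. It assumes Phred+33 quality encoding (the standard for current Illumina FASTQ files) and uses hypothetical function names; it is not the QIIME 2 implementation.

```python
import re


def phred_from_ascii(qual_string, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in qual_string]


def passes_qc(seq, phred, min_len=150, min_mean_q=20, max_homopolymer=8):
    """Apply the four stated read filters; True means the read is kept."""
    seq = seq.upper()
    if len(seq) < min_len:                        # (1) too short
        return False
    if sum(phred) / len(phred) < min_mean_q:      # (2) low mean quality
        return False
    if set(seq) - set("ACGT"):                    # (3) ambiguous bases
        return False
    # (4) a base followed by >= max_homopolymer copies of itself,
    # i.e. a mononucleotide run longer than max_homopolymer bp
    if re.search(r"([ACGT])\1{%d,}" % max_homopolymer, seq):
        return False
    return True


if __name__ == "__main__":
    read = "ACGT" * 40
    print(passes_qc(read, phred_from_ascii("I" * len(read))))
```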
After retaining samples with more than 500 reads, the reads were grouped at the phylum level (using phyloseq), and relative abundance at the phylum level was estimated by group. Seven phyla were detected in total. The taxa were sorted by abundance to improve visualization and then shown in box plots by group, faceted by phylum, using the raw counts. Many samples had high counts of Bacteroidetes, followed by Firmicutes and Proteobacteria; most samples had low read counts for the other phyla, with some outlying samples. To formally test for differences in phylum-level abundance, a multivariate test of the overall composition between groups of samples was conducted using the HMP package. This test assumes a Dirichlet-multinomial distribution for the data and tests for a difference in location (the mean distribution of each taxon) across groups while accounting for overdispersion in the count data.
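The sample filtering and phylum-level aggregation described above can be sketched in plain Python. This illustration assumes a nested dictionary as the OTU table and a separate OTU-to-phylum mapping; the function name, data layout, and "Unassigned" label are hypothetical, and the study itself used phyloseq in R.

```python
from collections import defaultdict


def phylum_relative_abundance(otu_counts, otu_to_phylum, min_reads=500):
    """otu_counts: {sample: {otu: count}}; otu_to_phylum: {otu: phylum}.
    Keeps samples with more than min_reads reads and returns
    {sample: {phylum: relative abundance}}."""
    result = {}
    for sample, counts in otu_counts.items():
        total = sum(counts.values())
        if total <= min_reads:            # drop low-depth samples
            continue
        by_phylum = defaultdict(int)
        for otu, count in counts.items():
            by_phylum[otu_to_phylum.get(otu, "Unassigned")] += count
        result[sample] = {p: c / total for p, c in by_phylum.items()}
    return result


if __name__ == "__main__":
    otus = {"s1": {"o1": 400, "o2": 200, "o3": 100},
            "s2": {"o1": 100, "o2": 50}}
    taxa = {"o1": "Bacteroidetes", "o2": "Firmicutes", "o3": "Bacteroidetes"}
    print(phylum_relative_abundance(otus, taxa))
```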
Taxon abundances at the phylum, class, order, family, genus, and species levels were calculated and compared statistically among groups using the R stats package. Alpha diversity indices, including Chao1, Simpson, and Shannon, were calculated from the tables generated in QIIME, and significant differences in these metrics were determined using the R package vegan. To investigate structural variation in microbial communities, beta diversity was analyzed using UniFrac distance metrics 24 and visualized via principal component analysis (PCA), principal coordinate analysis (PCoA), and nonmetric multidimensional scaling (NMDS). The separation between disease-group and normal-group samples suggested differences in community composition according to sample type. We used ADONIS to test whether the samples clustered by group beyond what would be expected from sampling variability.
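ADONIS implements permutational multivariate analysis of variance (PERMANOVA). A minimal NumPy sketch of the one-way pseudo-F statistic and its permutation p-value follows; it is a simplified stand-in for vegan's `adonis`, not a reimplementation of it, and the function name and defaults are assumptions.

```python
import numpy as np


def permanova(dist, groups, permutations=999, seed=0):
    """One-way PERMANOVA on a square distance matrix.
    Returns (pseudo-F, permutation p-value)."""
    d2 = np.asarray(dist, dtype=float) ** 2
    groups = np.asarray(groups)
    n = len(groups)
    labels = np.unique(groups)
    a = len(labels)
    # Total sum of squares: (1/N) * sum of squared distances over pairs i < j
    ss_total = d2[np.triu_indices(n, 1)].sum() / n

    def pseudo_f(g):
        # Within-group SS: sum over groups of (1/n_g) * within-group pair sums
        ss_within = 0.0
        for lab in labels:
            idx = np.where(g == lab)[0]
            sub = d2[np.ix_(idx, idx)]
            ss_within += sub[np.triu_indices(len(idx), 1)].sum() / len(idx)
        ss_between = ss_total - ss_within
        return (ss_between / (a - 1)) / (ss_within / (n - a))

    f_obs = pseudo_f(groups)
    rng = np.random.default_rng(seed)
    # Shuffling labels keeps group sizes fixed
    hits = sum(pseudo_f(rng.permutation(groups)) >= f_obs
               for _ in range(permutations))
    return f_obs, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    x = np.array([0.0, 1.0, 5.0, 6.0])
    print(permanova(np.abs(x[:, None] - x[None, :]), ["a", "a", "b", "b"], 99))
```

For Euclidean distances, the pseudo-F reduces to the classical one-way ANOVA F statistic, which makes the toy example easy to check by hand.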
The abundances of microorganisms at the genus level were used as features and normalized by a centered log-ratio (CLR) transformation. Three unsupervised ML algorithms for feature reduction, independent component analysis (ICA), principal component analysis (PCA), and random projection (RP), were trained on the normalized OTU features, each yielding 20, 40, 60, or 80 reduced features. For each number of reduced features, four supervised ML algorithms for classification were trained on the reduced features: support vector machine (SVM), random forest (RF), multilayer perceptron (MLP), and convolutional neural network (CNN) (Table 2 and Table S4). This design allowed the performance of each model architecture for predictive classification and diagnosis of LDs to be compared across feature-reduction algorithms and feature numbers. In addition, the four classifiers were trained with features having an LDA score greater than 2.0. Data were split into training (70%) and internal validation testing (30%) datasets. The training performance of each ML model was evaluated using Stratified ShuffleSplit tenfold cross-validation 41, and the process was repeated 10 times. Hyperparameter tuning was executed automatically by caret, testing 60 different values for each hyperparameter. In total, 52 combinations of ML algorithms were evaluated for each pair of subgroups: the 3 × 4 pairings of a feature-reduction algorithm with a classifier at each of the four feature numbers (20, 40, 60, and 80), giving 48 combinations, plus the four classifiers trained on LDA-selected features. In the internal validation testing phase, the prediction performance of each combination was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, recall, precision, and F1 score. Box plots of the AUC, accuracy, recall, precision, and F1 score values were generated using the ggplot2 package in R. The entire process was repeated for each pair of subgroups (7 ALD and 1 MASLD) within each group.
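One feature-reduction/classifier combination from the workflow above can be sketched with scikit-learn, which the study reports using for PCA and RF. This is a minimal sketch under several assumptions: a random forest stands in for the CNN (which the study built in Keras), the CLR pseudocount of 0.5 and all function names are hypothetical, `test_size=0.3` inside cross-validation is an assumption, and hyperparameter search (reported as caret with 60 values per hyperparameter) is omitted; a `GridSearchCV` wrapper could be added around the pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (StratifiedShuffleSplit, cross_val_score,
                                     train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def clr_transform(x, pseudocount=0.5):
    """Row-wise centered log-ratio; the pseudocount is an assumption."""
    logx = np.log(np.asarray(x, dtype=float) + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def build_pipeline(n_components=40, seed=0):
    """CLR -> PCA feature reduction -> classifier (RF swapped in for CNN)."""
    return Pipeline([
        ("clr", FunctionTransformer(clr_transform)),
        ("reduce", PCA(n_components=n_components, random_state=seed)),
        ("clf", RandomForestClassifier(n_estimators=200, random_state=seed)),
    ])


def evaluate(x, y, n_components=40, seed=0):
    """70/30 split, tenfold StratifiedShuffleSplit CV on the training set,
    then AUC on the held-out internal validation set."""
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.3, stratify=y, random_state=seed)
    cv = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=seed)
    model = build_pipeline(n_components, seed)
    cv_auc = cross_val_score(model, x_train, y_train, cv=cv, scoring="roc_auc")
    model.fit(x_train, y_train)
    test_auc = roc_auc_score(y_test, model.predict_proba(x_test)[:, 1])
    return cv_auc.mean(), test_auc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lam = np.full((200, 100), 5.0)     # synthetic genus counts, 200 subjects
    lam[100:, :20] = 30.0              # hypothetical disease-associated genera
    x = rng.poisson(lam)
    y = np.array([0] * 100 + [1] * 100)
    print(evaluate(x, y))
```

Placing the CLR and PCA steps inside the pipeline means that PCA is refit on each cross-validation training fold, so the held-out folds do not leak into the feature reduction.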
All runs were performed on a machine running Ubuntu 18.04 with an Intel Core i9-9820X CPU (10 cores), 64 GB of memory, and an NVIDIA GTX 1080 Ti GPU. The scikit-learn Python library was used for ICA, PCA, RP, SVM, and RF, and the Keras Python library was used for MLP and CNN.
https://doi.org/10.1038/s41598-024-60768-2

Figure 4. Differences in the metabolic dysfunction-associated steatotic liver disease group. (A) α diversity. (B) β diversity. (C) Composition of phyla. (D) Taxonomic features with an LDA score > 2.0 plotted as a cladogram and (E) LEfSe bar graph for each group. (F) Heatmap of differentially abundant genera and species. ELE elevated liver enzyme.

Figure 5. Diagnostic values of the machine learning strategy in liver disease. (A) Alcohol-associated liver disease. (B) Metabolic dysfunction-associated steatotic liver disease. (C) External validation. ELE elevated liver enzyme; HCC hepatocellular carcinoma.

Table 1. Performance measures of different machine learning algorithms. ELE elevated liver enzyme; HCC hepatocellular carcinoma; ICA independent component analysis; PCA principal component analysis; RP random projection; SVM support vector machine; RF random forest; MLP multilayer perceptron; CNN convolutional neural network. Forty reduced feature dimensions are considered from each of ICA, PCA, and RP.